Conferences in Research and Practice in Information Technology
  

Online Version - Last Updated - 20 Jan 2012

 

 

Seed-Detective: A Novel Clustering Technique Using High Quality Seed for K-Means on Categorical and Numerical Attributes

Rahman, Md. A. and Islam, Md. Z.

In this paper we present a novel clustering technique called Seed-Detective. It combines modified versions of two existing techniques, namely Ex-Detective and Simple K-Means. Seed-Detective first discovers a set of preliminary clusters using our modified Ex-Detective, which allows a data miner to assign different weights (importance levels) to all attributes, both numerical and categorical. The centers of the preliminary clusters are then used as the initial seeds for the modified Simple K-Means, which, unlike the existing Simple K-Means, does not select its initial seeds randomly. Centers of the preliminary clusters are naturally expected to be better quality seeds than randomly chosen seeds. Given better quality initial seeds as input, the modified Simple K-Means is expected to produce better quality clusters. We compare Seed-Detective with several existing techniques, including Ex-Detective, Simple K-Means, Basic Farthest Point Heuristic (BFPH) and New Farthest Point Heuristic (NFPH), on two publicly available natural data sets. BFPH and NFPH were shown in the literature to be better than Simple K-Means. However, our initial experimental results indicate that Seed-Detective produces better clusters than the other techniques, based on several evaluation criteria including F-measure, entropy and purity. Another contribution of this paper is the experimental evaluation of Ex-Detective, which had never been tested before.
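The seeding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preliminary cluster labels are assumed to be given (the modified Ex-Detective step is not reproduced here), and the function names `distance`, `center` and `seeded_kmeans`, the weighted mixed distance (absolute difference for numerical attributes, 0/1 mismatch for categorical ones) and the mean/mode cluster center are all illustrative assumptions.

```python
from collections import Counter


def distance(a, b, num_idx, cat_idx, weights):
    """Weighted mixed distance (an assumption, not the paper's definition):
    absolute difference for numerical attributes, 0/1 mismatch for categorical.
    Numerical attributes are assumed to be normalised beforehand."""
    d = 0.0
    for j in num_idx:
        d += weights[j] * abs(a[j] - b[j])
    for j in cat_idx:
        d += weights[j] * (0.0 if a[j] == b[j] else 1.0)
    return d


def center(records, num_idx, cat_idx):
    """Cluster center: mean of each numerical attribute, mode of each categorical one."""
    c = [None] * len(records[0])
    for j in num_idx:
        c[j] = sum(r[j] for r in records) / len(records)
    for j in cat_idx:
        c[j] = Counter(r[j] for r in records).most_common(1)[0][0]
    return c


def seeded_kmeans(data, preliminary_labels, num_idx, cat_idx, weights, max_iter=50):
    """K-Means-style loop whose initial seeds are the centers of preliminary
    clusters instead of randomly selected records."""
    # Group records by their (given) preliminary cluster label.
    groups = {}
    for rec, lab in zip(data, preliminary_labels):
        groups.setdefault(lab, []).append(rec)
    seeds = [center(g, num_idx, cat_idx) for _, g in sorted(groups.items())]

    labels = None
    for _ in range(max_iter):
        # Assign every record to its nearest seed.
        new_labels = [
            min(range(len(seeds)),
                key=lambda k: distance(rec, seeds[k], num_idx, cat_idx, weights))
            for rec in data
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Recompute each seed from its current members (empty clusters keep their seed).
        for k in range(len(seeds)):
            members = [rec for rec, lab in zip(data, labels) if lab == k]
            if members:
                seeds[k] = center(members, num_idx, cat_idx)
    return labels, seeds
```

Because the seeds are derived deterministically from the preliminary clusters, repeated runs on the same input give the same partition, whereas randomly seeded K-Means can vary from run to run.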
Cite as: Rahman, Md. A. and Islam, Md. Z. (2011). Seed-Detective: A Novel Clustering Technique Using High Quality Seed for K-Means on Categorical and Numerical Attributes. In Proc. Australasian Data Mining Conference (AusDM 11), Ballarat, Australia. CRPIT, 121. Vamplew, P., Stranieri, A., Ong, K.-L., Christen, P. and Kennedy, P. J. Eds., ACS. 211-220.